52 research outputs found

    Asynchronous decentralized accelerated stochastic gradient descent

    In this work, we introduce an asynchronous decentralized accelerated stochastic gradient descent type of method for decentralized stochastic optimization, given that communication and synchronization are the major bottlenecks. We establish $\mathcal{O}(1/\epsilon)$ (resp., $\mathcal{O}(1/\sqrt{\epsilon})$) communication complexity and $\mathcal{O}(1/\epsilon^2)$ (resp., $\mathcal{O}(1/\epsilon)$) sampling complexity for solving general convex (resp., strongly convex) problems.

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    The role of the P-center in cortical tracking of speech


    Mental representations of regional phonological variation in conversational interaction

    It is in social interaction, the primary site of the occurrence of spoken language (Local, 2003), that speech is learned, that it is produced every day, and that it evolves. New interdisciplinary approaches to the study of speech, particularly in sociophonetics and in recent developments in conversational interaction, open new avenues for modeling speech processing. A central question in this enterprise relates to the characterization of the mental representations of speech sounds. We address this question using the exemplarist approach to speech processing, which proposes that speech sounds are stored in memory along with detailed contextual information. We present a new interactional task, GMUP (which stands for "Group 'em up"), designed to collect realizations of highly controlled phonological material produced by two interactants in an ecologically valid experimental setting. The phonological variables describe differences between two varieties of spoken French, Northern French and Southern French. Automatic speech recognition tools were developed to evaluate phonetic convergence, an observable of the evolution of the mental representations of speech, at two levels of granularity: at the categorical level of the phonological variable and at a more fine-grained, subphonemic level. The use of large-scale detailed acoustic measures allows us to finely characterize interindividual differences in the evolution of the acoustic realizations associated with the mental representations of speech in conversational interaction.

    Information-preserving temporal reallocation of speech in the presence of fluctuating maskers

    How can speech be retimed so as to maximise its intelligibility in the face of competing speech? We present a general strategy which modifies local speech rate to minimise overlap with a known fluctuating masker. Continuous time-scale factors are derived in an optimisation procedure which seeks to minimise overall energetic masking of the speech by the masker while additionally unmasking those speech regions potentially most important for speech recognition. Intelligibility increases are evaluated with both objective and subjective measures and show significant gains over an unmodified baseline, with larger benefits at lower signal-to-noise ratios. The retiming approach does not lead to benefits for speech mixed with stationary maskers, suggesting that the gains observed for the fluctuating masker are not simply due to durational expansion. Index Terms: speech intelligibility, temporal modification, energetic and informational masking.

    Variation phonologique régionale en interaction conversationnelle

    It is in social interaction, the primary site of the occurrence of spoken language (Local, 2003), that speech is learned, that it is produced every day, and that it evolves. New interdisciplinary approaches to the study of speech, particularly in sociophonetics and in recent developments in conversational interaction, open new avenues for modeling speech processing. A central question in this enterprise relates to the characterization of the mental representations of speech sounds. We address this question using the exemplarist approach to speech processing, which proposes that speech sounds are stored in memory along with detailed contextual information. We present a new interactional task, GMUP (which stands for "Group 'em up"), designed to collect realizations of highly controlled phonological material produced by two interactants in an ecologically valid experimental setting. The phonological variables describe differences between two varieties of spoken French, Northern French and Southern French. Automatic speech recognition tools were developed to evaluate phonetic convergence, an observable of the evolution of the mental representations of speech, at two levels of granularity: at the categorical level of the phonological variable and at a more fine-grained, subphonemic level. The use of large-scale detailed acoustic measures allows us to finely characterize interindividual differences in the evolution of the acoustic realizations associated with the mental representations of speech in conversational interaction.

    Effects of linear and nonlinear speech rate changes on speech intelligibility in stationary and fluctuating maskers

    Algorithmic modifications to the durational structure of speech designed to avoid intervals of intense masking lead to increases in intelligibility, but the basis for such gains is not clear. The current study addressed the possibility that the reduced information load produced by speech rate slowing might explain some or all of the benefits of durational modifications. The study also investigated the influence of masker stationarity on the effectiveness of durational changes. Listeners identified keywords in sentences that had undergone linear and nonlinear speech rate changes resulting in overall temporal lengthening in the presence of stationary and fluctuating maskers. Relative to unmodified speech, a slower speech rate produced no intelligibility gains for the stationary masker, suggesting that a reduction in information rate does not underlie intelligibility benefits of durationally modified speech. However, both linear and nonlinear modifications led to substantial intelligibility increases in fluctuating noise. One possibility is that overall increases in speech duration provide no new phonetic information in stationary masking conditions, but that temporal fluctuations in the background increase the likelihood of glimpsing additional salient speech cues. Alternatively, listeners may have benefitted from an increase in the difference in speech rates between the target and background.

    Speaking to a common tune: Between-speaker convergence in voice fundamental frequency in a joint speech production task

    Recent research on speech communication has revealed a tendency for speakers to imitate at least some of the characteristics of their interlocutor's speech sound shape. This phenomenon, referred to as phonetic convergence, entails a moment-to-moment adaptation of the speaker's speech targets to the perceived interlocutor's speech. It is thought to contribute to setting up a conversational common ground between speakers and to facilitate mutual understanding. However, it remains uncertain to what extent phonetic convergence occurs in voice fundamental frequency (F0), in spite of the major role played by pitch, F0's perceptual correlate, as a conveyor of both linguistic information and communicative cues associated with the speaker's social/individual identity and emotional state. In the present work, we investigated to what extent two speakers converge towards each other with respect to variations in F0 in a scripted dialogue. Pairs of speakers jointly performed a speech production task, in which they were asked to alternately read aloud a written story divided into a sequence of short reading turns. We devised an experimental set-up that allowed us to manipulate the speakers' F0 in real time across turns. We found that speakers tended to imitate each other's changes in F0 across turns that were both limited in amplitude and spread over large temporal intervals. This shows that, at the perceptual level, speakers monitor slow-varying movements in their partner's F0 with high accuracy and, at the production level, that speakers exert a very fine-tuned control on their laryngeal vibrator in order to imitate these F0 variations. Remarkably, F0 convergence across turns was found to occur in spite of the large melodic variations typically associated with reading turns. Our study sheds new light on speakers' perceptual tracking of F0 in speech processing, and the impact of this perceptual tracking on speech production.